We investigate course registration data of 18 semesters at a Korean university to portray the time evolution of
students' positions in the network of fellow students. Apart from being a study of the social positions of students,
the present work is also an example of how large-scale, time-resolved affiliation networks can be analyzed. For
example, we discuss the proper definitions of weights and propose a redefined weighted clustering coefficient.
Among other things, we find that the students enter the network at the center and gradually diffuse towards
the periphery. On the other hand, the ties to the classmates of the first semester (still present at the university)
will, on average, become stronger as time progresses.



I. INTRODUCTION

Networks constitute, along with differential equation models and
cellular automata, a fundamental framework for analyzing and
modeling complex systems. The advent of modern
database technology has greatly vitalized the statistical study
of networks. Networks of electronic communication, genetic
interaction, hyperlinked web pages, and so on, are available
in sizes up to hundreds of millions of vertices (5), to be compared
with the data sets of a decade or so ago, which were mostly
curated manually, edge by edge. The sizes of these computer-compiled
data sets put new demands on algorithms and
analysis methods: an $O(N^5)$ algorithm, where $N$ is the number
of vertices, may work perfectly to analyze a food web (10)
but would be intractable for analysis of the contacts in an Internet
community (10). On the other hand, larger sizes rid
the data of finite-size biases and allow one to extrapolate the
conclusions to the large-size limit. These newly available data
sets have created a new subfield of network sociology
and are the main reason statistical physicists (traditionally
working in the large-size limit) have been joining this
interdisciplinary field.

In the present paper we use course registration data from
Ajou University, a mid-size Korean university located in
Suwon, Republic of Korea. Our data set consists of lists of
undergraduate students registered for courses over 18 semesters
(two semesters per year), starting with the spring semester of
1996 and ending with the fall semester of 2004. The basic network
of a particular semester is an "affiliation network" where
students and courses are two separate sets of vertices and
edges link students to the courses for which they are registered.
From such a "two-mode" network (a network with two classes
of vertices) one can make a "one-mode" projection to the set
of students (or courses), where one student (course) is connected
to another if they have a link to the same course (student)
in the affiliation network. In this paper we only consider
projections to the set of students and try to answer how
a student's time at the university can be characterized by network
statistics. Some basic statistics of the network of the fall
semester 2003 are given in Ref. (19). Apart from allowing us to view
Korean students' time at the university, and maybe that of university
students in general, we introduce new structural quantities and
methods of analyzing time-resolved affiliation network data.
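
The one-mode projection described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the input format (a dictionary from course to registered students), the function name `project_to_students`, and the toy data are hypothetical and are not taken from the data set used in this paper.

```python
from collections import defaultdict
from itertools import combinations


def project_to_students(registrations):
    """One-mode projection of a course -> students mapping onto students.

    `registrations` (hypothetical format) maps a course id to the students
    registered for it. Two students are linked if they share a course.
    """
    neighbors = defaultdict(set)
    for students in registrations.values():
        roster = set(students)  # ignore duplicate registrations
        for s in roster:
            neighbors[s]  # touch the key so isolated students appear as vertices
        for a, b in combinations(sorted(roster), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


# Toy data, invented for illustration only.
registrations = {
    "physics": ["ann", "bo", "cy"],
    "history": ["cy", "di"],
    "music": ["eve"],
}
graph = project_to_students(registrations)
print(sorted(graph["cy"]))
```

Projecting onto courses instead would only require swapping the roles of the two vertex classes in the input mapping.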



II. CONSTRUCTION OF THE NETWORKS 

From our lists of course-registration data we would like
to construct networks where an edge between two persons
means that these two students are likely to be acquainted. We
will consider both weighted and unweighted networks. For
the weighted networks, we want a high weight to represent
a high probability of the two students being acquainted. If
two students take a course with few attendees they are more
likely to know each other than if the course is large. Following
Ref. (16) we let a specific course $m$ contribute to the
weight between two students taking it with the inverse of the
total number of students taking $m$, $1/n(m)$. Moreover, if two students
take many courses in common, the chance of them being
acquainted is larger. To account for this we simply sum the
contributions of the individual courses:

$$ w(i,j) = \sum_{m \in C} \frac{\delta_m(i,j)}{n(m)}, \tag{1} $$

where $C$ is the set of all courses and $\delta_m(i,j)$ is unity if both $i$ and
$j$ take $m$ and $i \neq j$, and zero otherwise. Now, as mentioned,
our data is time resolved, which gives us a possibility of accounting
for the cessation of acquaintances: two persons are
less likely to feel acquainted if they took a course together
three years ago than if they were in the same class last
semester. The weight at a given time $t$ can thus be written as

$$ w_t(i,j) = \sum_{t' \le t} \rho(t - t') \sum_{m \in C} \frac{\delta_m(i,j,t')}{n(m,t')}, \tag{2} $$
FIG. 1 The number of vertices N (a), edges M (b), and the total weight T (c) in the networks. The instantaneous networks are the giant
components of the projection of the course-student network for a given semester onto the set of students. The accumulated networks are giant
components of the one-mode projection of accumulated course-student networks onto the sets of currently active students. An accumulated
course-student network contains all courses in the data set given in the specified semester or earlier.

FIG. 2 The average degrees for students studying four, seven, and nine semesters. (a) and (b) show the unweighted and weighted results,
respectively, for instantaneous networks; (c) and (d) show the unweighted and weighted results, respectively, for accumulated networks.



where $\rho(t)$ is a non-increasing function that accounts for the
decay of friendships over time, $n(m,t)$ is the number
of students taking the course $m$ in semester $t$, and $\delta_m(i,j,t)$
is the corresponding generalization of $\delta_m(i,j)$. We will use
the two simplest decay functions: either we let $\rho(t)$ decay as
fast as possible, and thus be unity for $t = 0$ and zero for $t > 0$,
giving an instantaneous network, or we let $\rho(t)$ not decay at
all (be constantly unity) and obtain an accumulated network.
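
As a hedged illustration of the weight construction, the following Python sketch sums the contribution $1/n(m,t')$ of every shared course, multiplied by a decay function of the elapsed time. The data layout (`history`, a dictionary from semester index to course rosters), the function names, and the toy data are assumptions made for this example. The restriction of accumulated networks to currently active students is deliberately left out to keep the sketch short.

```python
from collections import defaultdict
from itertools import combinations


def instantaneous(dt):
    """Decay that keeps only the current semester (dt == 0)."""
    return 1.0 if dt == 0 else 0.0


def accumulated(dt):
    """No decay at all: every earlier semester counts fully."""
    return 1.0


def weights_at(history, t, rho):
    """Pair weights at semester t for a decay function rho(dt).

    `history[s]` (hypothetical format) maps a course id to the list of
    students taking that course in semester s.
    """
    w = defaultdict(float)
    for s, courses in history.items():
        if s > t:
            continue  # later semesters do not contribute
        decay = rho(t - s)
        if decay == 0.0:
            continue
        for students in courses.values():
            roster = sorted(set(students))
            for a, b in combinations(roster, 2):
                w[(a, b)] += decay / len(roster)  # course contributes 1/n
    return dict(w)


# Toy data, invented for illustration only.
history = {
    0: {"intro": ["ann", "bo", "cy", "di"]},
    1: {"seminar": ["ann", "bo"]},
}
w_inst = weights_at(history, 1, instantaneous)
w_acc = weights_at(history, 1, accumulated)
print(w_inst, w_acc[("ann", "bo")])
```

In the toy data the small seminar contributes twice as much to the tie between the two students as the larger introductory course, which is the intended effect of the $1/n$ weighting.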

Some of the quantities we will use are based on shortest
path lengths. Since these are not defined in a general network,
we reduce our graphs to their giant components (largest
connected subgraphs). In these, the numbers of vertices are
95-100% of those of the original networks. Another technicality
concerns the temporal boundaries: we do not know how long a
student of the first semester of our data has been at the university,
and we do not know how long a student present in the last
semester will stay. We handle this problem by not including
the students present in the first and last semesters in the averages
of our quantities. There are, of course, students who are on
leave from the university for a few semesters. We estimate that around
15% of the data points are students who have returned from
a break. How to treat these students is a dilemma: On the one
hand, one would like the number of active semesters to be the
measure of time (in studying the time evolution of quantities).
On the other hand, one cannot exclude the returning students
from the network; after all, they are a part of the network after
their return. We will choose the latter alternative, and treat a
student on leave as still present at the university, but with zero
degree. The sizes of the networks are presented in Fig. 1. We
note that the accumulated networks sometimes have a higher $N$
than their instantaneous counterparts. This is because the additional
edges of the accumulated networks can give a larger
giant component; the number of vertices in the projection to the
set of students is the same. The jagged shape of Fig. 1(a)
arises because more students enter the university in the spring
semester (~1957 on average) than in the fall semester (~353
on average). Assuming that half of the students stay an odd (or
even) number of semesters, one would expect
a difference of (1957 - 353)/2 ≈ 802 between the spring and
fall semesters. That the actual average difference is smaller
(~635 on average) is an effect of more students taking an
even number of semesters, since many programs run for an even
number of semesters.
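
The reduction to the giant component mentioned at the start of this section can be sketched with a breadth-first search. The adjacency format (a dictionary from vertex to a set of neighbors), the function name, and the toy graph are assumptions for illustration, not the implementation used for the results in this paper.

```python
from collections import deque


def giant_component(neighbors):
    """Return the vertex set of the largest connected component."""
    seen = set()
    best = set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in neighbors[v]:
                if u not in component:
                    component.add(u)
                    queue.append(u)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy graph, invented for illustration only.
neighbors = {
    "ann": {"bo"}, "bo": {"ann", "cy"}, "cy": {"bo"},
    "di": {"eve"}, "eve": {"di"},
    "fay": set(),  # an isolated vertex
}
giant = giant_component(neighbors)
print(sorted(giant), len(giant) / len(neighbors))
```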



III. THE MEASURED STATISTICS 

In this section we present the quantities we measure and
the values we obtain, and interpret them. All our curves are
functions of the time (number of semesters) $\Delta t$ a student has
studied at the university. To avoid, as much as possible, that
behavioral differences get averaged away, we measure the
curves for students present at the university for a fixed time. As it
turns out, these curves are qualitatively similar (but sometimes
differ quantitatively).

A. Degree and weight 

A fundamental vertex quantity of unweighted networks is
the degree, $k(i)$, defined as the number of edges leading to a
vertex $i$. Degree gives a measure of how central a vertex is in
its local surroundings (and is therefore sometimes referred to
as degree centrality (22)). The straightforward generalization
of degree to weighted networks is (22):

$$ k_w(i) = \sum_j w(i,j). \tag{3} $$

In Figs. 2(a) and (b) the degree, in its unweighted and
weighted versions, for instantaneous networks is plotted for
students staying nine semesters at the university. Other
lengths of study give the same general shape of both $\langle k \rangle$ and
$\langle k_w \rangle$: an early decrease of $\langle k \rangle$ that flattens out, and a rather
constant $\langle k_w \rangle$. This decrease is probably a result of students
taking increasingly specialized courses, in smaller and smaller
classes. Results for accumulated networks are displayed in
Figs. 2(c) and (d). These have a maximum around the fourth
semester. Note that if none of student A's fellow students of
the first semester leave before A, then A's degree would be
strictly increasing in the accumulated networks. The decreasing
part is a result of old neighbors (vertices one edge away)
exiting the network.
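
For concreteness, the degree and its weighted generalization of Eq. (3) can be computed from a table of pair weights as in the sketch below. The storage format (each unordered pair stored once as a dictionary key), the function name, and the numbers are hypothetical choices for this example.

```python
from collections import defaultdict


def degrees(weights):
    """Return (k, k_w): unweighted and weighted degree of every vertex.

    `weights` maps an unordered pair (i, j), stored once, to w(i, j).
    """
    k = defaultdict(int)
    k_w = defaultdict(float)
    for (i, j), w in weights.items():
        if w <= 0:
            continue  # zero weight means there is no edge
        for v in (i, j):
            k[v] += 1    # one more edge at v
            k_w[v] += w  # sum of adjacent weights, as in Eq. (3)
    return dict(k), dict(k_w)


# Toy weights, invented for illustration only.
weights = {("ann", "bo"): 0.75, ("ann", "cy"): 0.25, ("bo", "cy"): 0.25}
k, k_w = degrees(weights)
print(k["ann"], k_w["ann"])
```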

B. Relation to fellow freshmen 

To get a picture of how the ties come and go in our networks,
we study the relation of a student to her, or his, fellow
freshmen (defined as the neighborhood of a student in the first
semester at the university). In Fig. 3(a) we display the average
fraction $\phi$ of fellow freshmen remaining after $t$ semesters
for students who stay four, seven, and nine semesters at the
university. $\phi$ is remarkably similar for the different numbers of semesters
at the university.

Students entering and leaving the university is of course not
the only dynamic mechanism present. The choice of courses
can make students drift apart or get closer. This can be measured
by graph distances. The distance of a path $P$ between
two students, in the unweighted network, is the number of
edges in it, whereas for the weighted network it is the sum of
reciprocal weights:

$$ d(i,j) = \min_{P \in \mathcal{P}(i,j)} |P|, \tag{4a} $$

$$ d_w(i,j) = \min_{P \in \mathcal{P}(i,j)} \sum_{(i',j') \in P} \frac{1}{w(i',j')}, \tag{4b} $$

where $\mathcal{P}(i,j)$ is the set of paths between $i$ and $j$. In Fig. 3(b)
and (c) we plot the average shortest distances between a nine-semester
student and her, or his, fellow freshmen still present
in the network, for unweighted $d_0$ and weighted $d_0^w$ distances
respectively. The unweighted distances are strictly increasing.
So one's fellow students of the first semester do not
only become fewer in number but also further away in the distances
of binary networks. In the authors' personal experience,
the acquaintances with some of the fellow freshmen
grew stronger with time. This phenomenon is visible for the
weighted distances of Fig. 3(c). Interestingly, $d_0^w$ of the instantaneous
network has both a maximum and a minimum in
the interior of the $\Delta t$ interval. We believe that this is because
the students choosing different courses than the nine-semester
students (and thus causing the early growth of $d_0^w$) often
leave the university sooner than the nine-semester students.
This picture is supported by the monotonic growth seen for
four-semester students. Calculating weighted distances for
all vertices is the computationally most demanding part of our
analysis, requiring $O(MN \log N)$ time with Dijkstra's algorithm
implemented using a binary heap for storing the next
shortest distances.

FIG. 3 Statistics about the fellow students of the first semester. (a)
shows the fraction $\phi$ of students of the first semester that still are students
of the university. (b) and (c) display the distance to the neighborhood
of the first semester of students staying nine semesters in
total (unless otherwise stated); (b) shows the unweighted version $d_0$
and (c) shows the weighted counterpart $d_0^w$. All error bars are smaller
than the symbol size.
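
A minimal sketch of the weighted distance of Eq. (4b), using Dijkstra's algorithm with Python's `heapq` binary heap, is given below. Each edge costs the reciprocal of its weight. The pair-weight format, the helper `mean_distance_to` (standing in for the average over fellow freshmen), and the toy numbers are assumptions for illustration rather than the code used for the figures.

```python
import heapq


def weighted_distances(weights, source):
    """Dijkstra distances from `source`, each edge costing 1 / w(i, j)."""
    adjacency = {}
    for (i, j), w in weights.items():
        if w > 0:  # zero weight means no edge
            adjacency.setdefault(i, []).append((j, 1.0 / w))
            adjacency.setdefault(j, []).append((i, 1.0 / w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for u, cost in adjacency.get(v, []):
            nd = d + cost
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist


def mean_distance_to(dist, targets):
    """Average distance to those targets that are reachable."""
    reachable = [dist[v] for v in targets if v in dist]
    return sum(reachable) / len(reachable) if reachable else float("nan")


# Toy weights, invented for illustration only.
weights = {("ann", "bo"): 0.5, ("bo", "cy"): 0.25, ("ann", "cy"): 0.125}
dist = weighted_distances(weights, "ann")
print(dist, mean_distance_to(dist, ["bo", "cy"]))
```

In the toy example the direct tie between "ann" and "cy" is weak (cost 8), so the shortest weighted path runs through "bo" (cost 2 + 4 = 6).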

C. Closeness centralities 

In Sect. III A we discussed the behavior of degree, a quantity
measuring the centrality of a student in the local surroundings.
This section concerns a global centrality measure. The
study of global centrality in graphs dates back to the 19th century.
There are many different notions of centrality, and
thus many different measures, each trying to quantify a certain
aspect of centrality (22). One of the simplest is the closeness
centrality, the reciprocal average shortest distance to
all other vertices:

$$ C_C(i) = \frac{N - 1}{\sum_{j \neq i} d(i,j)}. \tag{5} $$

(For weighted networks $d(i,j)$ is replaced by $d_w(i,j)$.) In
Fig. 4 we plot $C_C$ and $C_C^w$ for the nine-semester students. The
impression this figure gives is of a slowly decreasing centrality.
Curves for other lengths of university stay are no less noisy, but
all have a general downward trend. This leads us to the conclusion
that when one starts at the university, taking general courses
in big classes, one is more central in the network of students
than when one takes more specialized courses later in one's
university education. It is worth noting that this holds even
for the weighted accumulated networks: it does not matter
that one gets closer to one's fellow freshmen (as discussed
above), one still gets increasingly peripheral in the network as
a whole. From the point of view of the educator, this analysis
confirms what we already know: that students are formed
mainly in the first semesters, when they are embedded in the
short distances at the core of the course-affiliation network.
After the first semesters, students' identities are already beginning
to take shape and one can merely add to the already given
foundation. From that point it is unusual for students to make
dramatic shifts.

FIG. 4 Average closeness centralities of students who stay at the
university nine semesters in total. (a) shows the unweighted version
$C_C$ and (b) shows the weighted counterpart $C_C^w$.
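
The unweighted closeness centrality of Eq. (5) can be illustrated with a breadth-first search, as in the following sketch. It assumes a connected graph (such as a giant component) given as a dictionary from vertex to neighbor set. The function name and the four-vertex path graph are hypothetical.

```python
from collections import deque


def closeness(neighbors, i):
    """Closeness of vertex i: (N - 1) divided by the sum of distances.

    Assumes the graph is connected, so every vertex is reached from i.
    """
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        for u in neighbors[v]:
            if u not in dist:
                dist[u] = dist[v] + 1  # one edge further than v
                queue.append(u)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# A path graph a - b - c - d, invented for illustration only.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(closeness(path, "a"), closeness(path, "b"))
```

The end vertex of the path has a lower closeness than an interior vertex, which mirrors the sense in which peripheral students are less central.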



D. Clustering coefficients 

Acquaintance networks are believed to have an overrepresentation
of strongly connected triads. That is, if A is
strongly connected to B and C, then B and C are also likely to
have a strong connection. A quantity to measure the strength
of connections within $i$'s neighborhood is the local clustering
coefficient (23):

$$ c(i) = e(\Gamma_i) \Big/ \binom{k(i)}{2}, \tag{6} $$

where $e(\Gamma_i)$ is the number of edges within $i$'s neighborhood $\Gamma_i$.
The time evolution of clustering coefficients for nine-semester
students is plotted in Fig. 5(a). We believe the overall decrease
reflects that the choice of courses a student takes is increasingly
specialized and individual (which of course decreases the fraction
of fellow students themselves taking the same course).

FIG. 5 Average clustering coefficient of students who spend a total of nine
semesters at the university. (a) shows the unweighted version $\langle c \rangle$ and
(b) shows the weighted counterpart $\langle c_w \rangle$. All error bars are smaller
than the symbol size.
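
The local clustering coefficient of Eq. (6) can be computed directly from neighbor sets, as sketched here. The adjacency format and the toy graph are hypothetical. Returning zero for vertices with fewer than two neighbors is a convention assumed for this example, since Eq. (6) is undefined in that case.

```python
from itertools import combinations


def local_clustering(neighbors, i):
    """Fraction of pairs of i's neighbors that are themselves linked."""
    nbrs = neighbors[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs, so no clustering
    links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
    return links / (k * (k - 1) / 2)


# Toy graph, invented for illustration: a triangle a-b-c plus a pendant d.
neighbors = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(local_clustering(neighbors, "a"), local_clustering(neighbors, "b"))
```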

The decreasing sizes of the classes as a student progresses
to more specialized subjects make the weights between the
students stronger. Consequently one can argue that the triads
become stronger and the clustering coefficient should be
adjusted to reflect this. It turns out that the generalization of
the local clustering coefficient to weighted networks is not entirely
trivial. We would like such a measure to fulfill the following
requirements:

1. The values should lie in the interval [0, 1]. 

2. If the weight matrix is replaced by a 0,1-adjacency ma- 
trix the Watts-Strogatz clustering coefficient should be 
recovered. 

3. Zero weight should consistently represent the absence 
of an edge. 

4. The contribution from one triad including the vertex $i$
to $i$'s weighted clustering coefficient $c_w(i)$ should be proportional
to the weight of each edge in the triad.

Unfortunately, none of the proposed definitions of a weighted
clustering coefficient that we are aware of fulfills all
these requirements. We note that an alternative way of writing
the Watts-Strogatz clustering coefficient is

$$ c(i) = \frac{\sum_{jk} a_{ij} a_{jk} a_{ki}}{\sum_{jk} a_{ij} a_{ki}} = \frac{(A^3)_{ii}}{(A \mathbf{1} A)_{ii}}, \tag{7} $$

where $A$ is the adjacency matrix with elements $a_{ij}$ and $\mathbf{1}$ is the
matrix where all elements are 1. In this representation the
generalization from an adjacency matrix to a
weight matrix is straightforward:

$$ c_w(i) = \frac{\sum_{jk} w_{ij} w_{jk} w_{ki}}{\max_{ij} w_{ij} \sum_{jk} w_{ij} w_{ki}} = \frac{(W^3)_{ii}}{(W W_{\max} W)_{ii}}, \tag{8} $$

where $W_{\max}$ is the matrix with $\max_{ij} w_{ij}$ on all positions,
which indeed fulfills all four requirements above. The
weighted clustering coefficient counterpart of Fig. 5(a) is displayed
in Fig. 5(b). We see that, for the instantaneous networks,
the increasing strength of the triads counterbalances
the effect of more personalized curricula, so the $\langle c_w \rangle$ curves
are flatter.

FIG. 6 Conditional uniform graph tests of $d_0$ (a) and $\langle C_C \rangle$ (b) for
nine-semester students.
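
The weighted clustering coefficient of Eq. (8) can be evaluated with a double loop over a weight matrix, as in the sketch below. One detail is an assumption of this example and not a statement from the text: the sums skip the terms with $j = k$, so that a 0/1 matrix gives exactly the pair-counting value of Eq. (6). The list-of-lists matrix format and the toy matrices are likewise hypothetical.

```python
def weighted_clustering(W, i):
    """Weighted clustering of vertex i for a symmetric weight matrix W.

    Numerator:   sum over j != k of w_ij * w_jk * w_ki
    Denominator: max(W) times the sum over j != k of w_ij * w_ki
    Skipping j == k is an assumption of this sketch (see the lead-in).
    """
    n = len(W)
    w_max = max(max(row) for row in W)
    num = 0.0
    den = 0.0
    for j in range(n):
        for k in range(n):
            if j == k:
                continue
            num += W[i][j] * W[j][k] * W[k][i]
            den += W[i][j] * W[k][i]
    if den == 0.0 or w_max == 0.0:
        return 0.0  # fewer than two neighbors: treated as zero clustering
    return num / (w_max * den)


# 0/1 matrix of a triangle 0-1-2 with a pendant vertex 3 attached to 0.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
# A triangle whose edge opposite vertex 0 has half the maximal weight.
W = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 0.5],
     [1.0, 0.5, 0.0]]
print(weighted_clustering(A, 0), weighted_clustering(W, 0))
```

In the second toy matrix the triad through vertex 0 is closed by an edge of half the maximal weight, and the coefficient drops from 1 to 0.5 accordingly, which illustrates the fourth requirement.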



IV. CONDITIONAL UNIFORM GRAPH TESTS

In this section we put some of the results of the measurements
in perspective by comparing them to results for graphs with
some network structure averaged away. A thorough analysis
of this kind, to establish the interpretations we propose above,
would extend the size of the paper so much that we omit it. Instead
we perform case studies of two of our unweighted quantities,
$d_0$ and $\langle C_C \rangle$.

The standard way to view a complex network is to say it
is to some extent random and that it also has some degree of
structure: deviations from complete randomness induced by
the forces forming the network (17, 20). The commonly used
structural measures are not independent. To sort out if a given
quantity $X(G_0)$ (of a graph $G_0$) is dependent on a certain
other quantity $Y(G_0)$ one can perform a "conditional uniform
graph test" and compare $X(G_0)$ with the value of $X$ averaged
over graphs with $Y$ conditioned to $Y(G_0)$. In this paper
we will perform two simple conditional uniform graph tests:
First we rewire the two-mode course-student network before
making the projection to the student network and constructing
the weights (we call this RT randomization, mnemonic
for "randomize two-mode"). We treat multiple course registrations
resulting from the randomization as just one course
registration. We also rewire the one-mode projections (RP,
mnemonic for "randomize projection"). This gives Poisson
random graphs of the same sizes as the real networks. For these
networks we average the results over ten randomizations.

In Fig. 6(a) we plot the average distance to the fellow freshmen
for the RT and RP randomizations. We see that the randomized
curves are approximately constant (apart from the
RT curve, where $d_0(0) = 1$ by definition). The distances in
the real networks are strictly shorter than in the randomized
networks. This is completely consistent with a picture of students
of similar subjects being closer than students of different
subjects: even if students are not classmates after a few
semesters, they are likely to take courses in similar subjects
and thus be closer than an arbitrary other student.

Fig. 6(b) shows the closeness centrality of nine-semester
students for real and randomized networks. We see that the
students have considerably lower centrality values than the
vertices of the test networks, and that the downward trend
is stronger for the original values. That the closeness is on
average smaller in the real networks reflects that the average
distances are larger than in the randomized networks. This is
logical if one assumes the student networks can be described
as groups of students majoring in similar subjects. Both randomizations
remove the tendency of students close to graduation
to take courses with few other students, which explains
the stronger decline of the real-world curves. The downward
trend of the randomized curves is explained by the increasing
average distances of the random networks due to the increasing
number of students.



V. SUMMARY AND DISCUSSION



We have analyzed a data set from a Korean university consisting
of course registration lists for 18 semesters. These lists
are made into weighted and unweighted networks of students.
The weight between two students is a sum over all courses
they have taken together. We argue that, if one wants the
weight to represent the probability of an acquaintance along
an edge, then the contribution to a weight from one particular
course should be chosen as inversely proportional to the
number of students taking the course. Furthermore, an old
course should contribute less to an edge than a newer one. We
use two decay functions for the weights: one constant (minimally
decreasing), defining accumulated networks, and one that is
zero for any course earlier than the present semester, defining
instantaneous networks. An unweighted edge is defined to be
present whenever the corresponding weight is non-zero. The
quantities we measure, all in both weighted and unweighted
versions, and all as functions of the time a student has been
present at the university, are: degree and closeness centralities;
distances to fellow students of the first semester; and
clustering coefficients. Some conclusions are strengthened by
conditional uniform graph tests.

We find that students enter the university in the center of
the student network; as a student progresses she, or he, will
slowly become more peripheral. On the other hand, the ties
with fellow students become stronger over time (to come
to this conclusion one needs weights, but the decay function
does not matter). The connectedness of the neighborhood, as
measured by the unweighted clustering coefficient, decreases
with time. We argue that this should not be interpreted as the
neighborhoods of students becoming weaker. Here, too, the conclusion
from the weighted quantity is more sound: the
triads of the instantaneous networks have roughly the same
strength over time.

From a qualitative point of view, interpreting our course- 
registration network, and affiliation networks in general, is 
problematic in the sense that it connects individuals with each 
other indirectly, and not through directly observed social in- 
teraction. This means we do not know if being in the same 
course also fosters significant social interaction or if it breeds 
friendship for example. From personal introspection only can 
we confirm that it sometimes did and it sometimes did not. 
Still, it is perfectly reasonable to assume that university course 
affiliation is an important identity shaper for a university stu- 
dent, and therefore the analysis bears considerable sociologi- 
cal relevance. In free societies most adult people join schools, 
clubs, organizations, etc. by choice. However, once affiliated 
one is under the influence of everything that goes on in that 
particular arena (25). At the university, most, but not all, of
the students in a course are literally changing in front of our
eyes over the duration of a course. Indeed, that is one of the
rewards of teaching. And, once your students are finished with
your course, they go on to other courses that have been chosen 
by them partly under the spell of whatever took place within 
the boundaries of that particular course you were offering. In 
this respect the analysis provides an illustration of the
identity-shaping processes that take place, not only in university
courses but in every social arena with which we are affiliated.
Within the limits of course offerings, the network is shaped by
this identity seeking on behalf of the students. Because individual
identity is largely defined by exclusion of other identities,
it is not surprising that we see a growing fragmentation
and distance between students as time passes by.
In network terms, university life, as uncovered in the analysis,
begins at the core and drifts to the periphery. In terms of identity
shaping, the track is the opposite: students start off blank
in scholarly identity and are gradually shaped into quasi-experts
in their major subject. And as the semesters pass, groups
form around fellow students that are following the same identity
path, i.e., students develop similar university identities.